Semantic Management of Deduplicate Tuples in the Relational Databases
Abstract
A relational database is a collection of relations. Duplicate tuples are common in many real-world relational databases. In a relational database, if the same real-world entity is represented by more than one tuple, those tuples are called duplicate tuples. Finding duplicate tuples and replacing them with one best tuple is called a fusion operation. Whenever duplicate tuples are found in the relations of a database, they must be replaced with one best approximate tuple that represents the joint information of all the duplicates. The present study proposes new techniques to find duplicate tuples and replace them with the correct real-world tuples. In the first step, duplicate tuples in the relation are classified based on the class label. In the second step, either the functional dependency method or the union method is applied to each set of duplicate tuples to replace it with the corresponding correct real-world single tuple. One possibility is to replace one set of duplicate tuples with one correct real-world tuple. Another is to replace two or more sets of duplicate tuples in the relation with one set of correct real-world tuples. Removing duplicate tuples from the relations of a relational database can sometimes create referential integrity violations. All such violations must be controlled and coordinated in the relations, both syntactically and semantically.
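The abstract does not specify how the union or functional dependency methods work. The following is therefore only a minimal Python sketch of the overall pipeline under explicit assumptions, not the authors' method:

- Each tuple is a dict.
- A hypothetical `entity` attribute plays the role of the class label.
- Conflicting attribute values are kept as a set. This is one possible reading of the union method; the functional dependency method is not modeled.
- The smallest key in each group survives.
- Foreign keys that referenced a removed tuple are redirected to the survivor, so no referential integrity violation remains.

All function and attribute names are illustrative.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def fuse_group(rows: List[Row], key: str) -> Row:
    """Fuse one set of duplicates into a single tuple (assumed union-style rule)."""
    fused: Row = {}
    # Visit attributes in order of first appearance across the duplicates.
    for attr in dict.fromkeys(a for r in rows for a in r):
        if attr == key:
            # Assumption: the smallest key becomes the surviving identifier.
            fused[attr] = min(r[key] for r in rows)
            continue
        values = {r[attr] for r in rows if r.get(attr) is not None}
        if len(values) == 1:
            fused[attr] = values.pop()       # all duplicates agree
        elif values:
            fused[attr] = frozenset(values)  # conflict: keep the union of values
        else:
            fused[attr] = None               # no duplicate supplied a value
    return fused


def deduplicate(parent: List[Row], key: str, label: str) -> Tuple[List[Row], Dict[Any, Any]]:
    """Step 1: group tuples by class label. Step 2: fuse each group.
    Also returns a map from every old key to the surviving key."""
    groups: Dict[Any, List[Row]] = defaultdict(list)
    for row in parent:
        groups[row[label]].append(row)
    fused_rows: List[Row] = []
    remap: Dict[Any, Any] = {}
    for rows in groups.values():
        best = fuse_group(rows, key)
        fused_rows.append(best)
        for r in rows:
            remap[r[key]] = best[key]
    return fused_rows, remap


def repair_foreign_keys(child: List[Row], fk: str, remap: Dict[Any, Any]) -> List[Row]:
    """Redirect references to removed tuples so referential integrity holds."""
    return [{**r, fk: remap.get(r[fk], r[fk])} for r in child]


# Hypothetical example: customers 1 and 2 describe the same real-world entity "A".
customers = [
    {"id": 1, "entity": "A", "name": "J. Smith", "city": "Leeds"},
    {"id": 2, "entity": "A", "name": "John Smith", "city": None},
    {"id": 3, "entity": "B", "name": "R. Khan", "city": "York"},
]
orders = [{"order_id": 10, "customer_id": 2}, {"order_id": 11, "customer_id": 3}]

fused, remap = deduplicate(customers, key="id", label="entity")
orders = repair_foreign_keys(orders, "customer_id", remap)
print(fused)
print(orders)
```

Here order 10 originally pointed at tuple 2, which is removed during fusion, so it is redirected to the surviving tuple 1. A real system would more likely perform this step with `ON UPDATE CASCADE` or an explicit `UPDATE` on the child relation.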
Similar resources
A Semantic Caching Method Based on Linear Constraints
Because performance is a crucial issue in database systems, data caching techniques have been studied in database research field, especially in client-server databases and distributed databases. Recently, the idea of semantic caching has been proposed. The approach uses semantic information to describe cached data items so that it tries to exploit not only temporal locality but also semantic lo...
Reverse Engineering of Relational Databases to Ontologies: An Approach Based on an Analysis of HTML Forms
We propose a novel approach to reverse engineering of relational databases to ontologies. Our approach is based on the idea that semantics of a relational database can be inferred, without an explicit analysis of relational schema, tuples and user queries. Rather, these semantics can be extracted by analyzing HTML forms, which are the most popular interface to communicate with relational databa...
Finding Hidden Structures in Relational Databases
Relational database management systems have been widely used over decades. An important research issue is to find hidden structural information in large relational databases. By hidden structural information we mean the information that cannot be easily found using a traditional query language SQL. In this talk, we discuss how to find hidden structural information in a relational database by vi...
A Conditional Model of Deduplication for Multi-Type Relational Data
Record deduplication is the task of merging database records that refer to the same underlying entity. In relational databases, accurate deduplication for records of one type is often dependent on the merge decisions made for records of other types. Whereas nearly all previous approaches have merged records of different types independently, this work models these inter-dependencies explicitly t...
Semantic sampling of existing databases through informative Armstrong databases
Functional dependencies (FDs) and inclusion dependencies (INDs) convey most of data semantics in relational databases and are very useful in practice since they generalize keys and foreign keys. Nevertheless, FDs and INDs are often not available, obsolete or lost in real-life databases. Several algorithms have been proposed for mining these dependencies, but the output is always in the same for...
Publication date: 2016